A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons

Authors

  • Satoshi Sato
  • Sayoko Kaide
Abstract

This paper proposes a simple and fast person-name filter, which plays an important role in the automatic compilation of a large bilingual person-name lexicon. The filter is based on the pn score, which is the sum of two component scores: the score of the first name and that of the last name. Each component score is calculated from two term sets: a dense set, in which most members are person names, and a baseline set, which contains fewer person names. The pn score takes one of five values, {+2, +1, 0, −1, −2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. The pn score can be easily extended to a bilingual pn score, which takes one of nine values, by summing the scores of the two languages. Experimental results show that our method works well for monolingual person names in English and Japanese, with F-scores of 0.929 and 0.939, respectively. The bilingual person-name filter performs better still, with an F-score of 0.955.
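
The scoring scheme in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a component score is derived from the dense and baseline sets, so the rule used here is an assumption made purely for illustration: a token scores +1 if its relative frequency in the dense set is more than `ratio` times its relative frequency in the baseline set, −1 in the opposite case, and 0 otherwise. The names `ComponentScorer`, `pn_score`, `bilingual_pn_score`, the `ratio` parameter, and the toy term lists are all hypothetical and do not come from the paper.

```python
from collections import Counter


class ComponentScorer:
    """Scores one name part (first or last name) as +1, 0, or -1.

    ASSUMPTION: the frequency-ratio rule below is illustrative only;
    the abstract does not specify the actual calculation.
    """

    def __init__(self, dense_terms, baseline_terms, ratio=2.0):
        self.dense = Counter(dense_terms)
        self.base = Counter(baseline_terms)
        # Guard against empty lists to avoid division by zero.
        self.dense_total = max(len(dense_terms), 1)
        self.base_total = max(len(baseline_terms), 1)
        self.ratio = ratio

    def score(self, token):
        d = self.dense[token] / self.dense_total  # relative freq. in dense set
        b = self.base[token] / self.base_total    # relative freq. in baseline set
        if d > self.ratio * b:
            return 1    # clearly more typical of the dense (name-rich) set
        if b > self.ratio * d:
            return -1   # clearly more typical of the baseline set
        return 0        # unseen token or no clear difference


def pn_score(first, last, first_scorer, last_scorer):
    """Sum of the two component scores; one of {+2, +1, 0, -1, -2}."""
    return first_scorer.score(first) + last_scorer.score(last)


def bilingual_pn_score(pn_lang_a, pn_lang_b):
    """Sum of the pn scores of two languages; one of nine values in [-4, +4]."""
    return pn_lang_a + pn_lang_b


# Toy term lists (hypothetical data, for demonstration only).
first_scorer = ComponentScorer(
    dense_terms=["john", "john", "mary", "new"],
    baseline_terms=["new", "new", "new", "data", "data", "the", "of", "john"],
)
last_scorer = ComponentScorer(
    dense_terms=["smith", "smith", "jones", "york"],
    baseline_terms=["york", "york", "york", "city", "city", "the", "of", "smith"],
)

print(pn_score("john", "smith", first_scorer, last_scorer))  # strong positive
print(pn_score("new", "york", first_scorer, last_scorer))    # undecidable
print(pn_score("data", "city", first_scorer, last_scorer))   # strong negative
```

Because each component is restricted to three values, their sum naturally falls on the five-point scale the abstract describes, and adding the pn scores of two languages gives the nine-point bilingual scale.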

Similar resources

Fast and easy development of pronunciation lexicons for names

We show that a good approach for the grapheme-to-phoneme conversion of Dutch proper names (e.g. person names, toponyms, etc), is to use a cascade of a general purpose grapheme-to-phoneme (G2P) converter and a special purpose phoneme-to-phoneme (P2P) converter. The G2P produces an initial transcription that is then transformed by the P2P. The P2P is automatically trained on reference transcripti...

Multilingual person name recognition and transliteration

We present a tool that extracts person names from multilingual news collections and matches name variants referring to the same person. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writing system. Due to our highly multilingual setting, we use an internal standard representation for name repres...

Automatic transcription error recovery for Person Name Recognition

Person Name Recognition from transcriptions of TV shows spoken content is a crucial step towards multimedia document indexing. Recognizing Person Names implies the combination of three main modules: Automatic Speech Recognition, NamedEntity Recognition and Entity Linking to associate the recognized surface form to a normalized Person Name. The three modules are potentially error prone. Hence, b...

Speaker Naming System by Associating Speech and Speaker Recognition Results

In this paper, we propose a system which can associate person names to individual speaker section. For this purpose, the automatic speaker segmentation is carried out utilizing online speaker modeling and speaker verification techniques. Key phrases and person names are also extracted by speech recognition. After this speaker segmentation and speech recognition, the person name is associated to...

Person Name Identification in Chinese Documents Using Finite State Automata

This research is about automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in many areas especially in Web Intelligent Agents related applications such as Web search engines, Web data mining, and automatic Web information analysis. We have noted that while finite state automata (FSA) based techniq...

Publication date: 2010